Post-Inference Methods for Scalable Probabilistic Modeling

Authors

  • Willie Neiswanger
  • Ryan P. Adams
  • Ruslan Salakhutdinov
  • Jeff Schneider
  • Yee Whye Teh
  • Eric P. Xing
Abstract

This thesis focuses on post-inference methods: procedures that can be applied after a standard inference algorithm has completed, to allow for increased efficiency, accuracy, or parallelism when learning probabilistic models of big data sets. These methods also aim to allow for efficient computation given distributed or streaming data, and given models that incorporate complex prior information. A few examples include:

  • Embarrassingly parallel inference. Large data sets are often distributed over a collection of machines. We first compute an inference result (e.g., with Markov chain Monte Carlo or variational inference) on each machine, in parallel, without communication between machines. Afterwards, we combine the results to yield an inference result for the full data set (see the combination sketch below).
  • Prior swapping. Certain model priors limit the number of applicable inference algorithms, or increase their computational cost. We first choose any "convenient prior" (e.g., a conjugate prior, or a prior that allows for computationally cheap inference) and compute an inference result. Afterwards, we use this result to efficiently perform inference with other, more sophisticated priors or regularizers (see the reweighting sketch below).
  • Local posterior revisions. After inferring an approximate posterior density via some inference algorithm, we may want to reduce the error of this approximation in certain regions of the parameter space, to increase the accuracy of posterior expectations or the performance of post-inference methods. We develop efficient methods for local revisions of posterior approximations.

We also describe the benefits of combining the above methods, present methodology for applying the embarrassingly parallel procedures when the number of machines is dynamic or unknown at inference time, develop randomized algorithms for efficient application of post-inference methods in distributed environments, show ways to optimize these methods by incorporating test functions of interest, and demonstrate how these methods can be implemented in probabilistic programming frameworks for automatic deployment.
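To make the first item concrete, below is a minimal sketch of one standard combination strategy from the embarrassingly parallel MCMC literature: fit a Gaussian to each machine's subposterior samples, then combine them with the product-of-Gaussians rule. The abstract does not specify a particular combination rule, so treat this as an assumed baseline; the function names (`gaussian_product`, `combine_subposteriors`) and the synthetic usage example are illustrative, not from the thesis.

```python
import numpy as np

def gaussian_product(means, covs):
    """Combine Gaussian subposterior approximations N(m_j, S_j).
    Their (renormalized) product is N(m, S) with
    S = (sum_j S_j^{-1})^{-1} and m = S @ sum_j S_j^{-1} m_j."""
    precisions = [np.linalg.inv(S) for S in covs]
    S = np.linalg.inv(sum(precisions))
    m = S @ sum(P @ m for P, m in zip(precisions, means))
    return m, S

def combine_subposteriors(sample_sets):
    """sample_sets: list of (n_j, d) arrays, one per machine, each drawn
    from a subposterior p_j(theta) propto p(theta)^(1/J) p(x_j | theta).
    Returns a Gaussian approximation (mean, cov) of the full-data posterior."""
    means = [s.mean(axis=0) for s in sample_sets]
    covs = [np.cov(s, rowvar=False) for s in sample_sets]
    return gaussian_product(means, covs)

# Usage with synthetic subposterior samples: J = 3 machines, d = 2 parameters.
rng = np.random.default_rng(0)
sample_sets = [rng.normal(loc=j, scale=1.0, size=(5000, 2)) for j in range(3)]
mean, cov = combine_subposteriors(sample_sets)
```

The appeal of this scheme is that the per-machine runs need no communication at all; only the cheap moment statistics are shipped to a combiner at the end.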
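For prior swapping, the simplest post-inference correction is self-normalized importance reweighting: the likelihood appears in both the convenient-prior ("false") posterior and the target posterior, so it cancels and the weights reduce to a ratio of prior densities. The sketch below, with hypothetical helper names, shows only this baseline under that assumption; the thesis itself develops more sophisticated prior-swapping procedures.

```python
import numpy as np

def swap_prior_weights(samples, log_target_prior, log_convenient_prior):
    """Self-normalized importance weights for re-targeting samples drawn
    from the false posterior p_f(theta | D) propto f(theta) L(theta)
    to the target posterior propto pi(theta) L(theta).
    The likelihood L cancels, so w_s propto pi(theta_s) / f(theta_s)."""
    log_w = np.array([log_target_prior(t) - log_convenient_prior(t)
                      for t in samples])
    log_w -= log_w.max()          # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def reweighted_expectation(samples, weights, test_fn):
    """Estimate E[test_fn(theta)] under the target posterior."""
    return sum(w * test_fn(t) for w, t in zip(weights, samples))
```

When the convenient and target priors differ substantially, these weights degenerate (a few samples dominate the estimate), which is one motivation for dedicated prior-swapping algorithms rather than plain reweighting.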


Related resources

Scalable Statistical Relational Learning for NLP

Prerequisites: No prior knowledge of statistical relational learning is required. Abstract: Statistical Relational Learning (SRL) is an interdisciplinary research area that combines first-order logic and machine learning methods for probabilistic inference. Although many Natural Language Processing (NLP) tasks (including text classification, semantic parsing, information extraction, coreferenc...


Augur: Data-Parallel Probabilistic Modeling

Implementing inference procedures for each new probabilistic model is time-consuming and error-prone. Probabilistic programming addresses this problem by allowing a user to specify the model and then automatically generating the inference procedure. To make this practical it is important to generate high performance inference code. In turn, on modern architectures, high performance requires para...


Augur: a Modeling Language for Data-Parallel Probabilistic Inference

It is time-consuming and error-prone to implement inference procedures for each new probabilistic model. Probabilistic programming addresses this problem by allowing a user to specify the model and having a compiler automatically generate an inference procedure for it. For this approach to be practical, it is important to generate inference code that has reasonable performance. In this paper, w...


Stochastic Inference for Scalable Probabilistic Modeling of Binary Matrices

Fully observed large binary matrices appear in a wide variety of contexts. To model them, probabilistic matrix factorization (PMF) methods are an attractive solution. However, current batch algorithms for PMF can be inefficient because they need to analyze the entire data matrix before producing any parameter updates. We derive an efficient stochastic inference algorithm for PMF models of fully...


Learning to Reason with a Scalable Probabilistic Logic

Learning to reason and understand the world’s knowledge is a fundamental problem in Artificial Intelligence (AI). Traditional symbolic AI methods were popular in the 1980s, when first-order logic rules were mostly handwritten, and reasoning algorithms were built on top of them. In the 90s, more and more researchers became interested in statistical methods that deal with the uncertainty of the d...



Publication year: 2018